Calibrating Resource-light Automatic MT Evaluation: a Cheap Approach to Ranking MT Systems by the Usability of Their Output

Authors

  • Bogdan Babych
  • Debbie Elliott
  • Anthony Hartley
Abstract

MT systems are traditionally evaluated with different criteria, such as adequacy and fluency. Automatic evaluation scores are designed to match these quality parameters. In this paper we introduce a novel parameter – usability (or utility) of output, which was found to integrate both fluency and adequacy. We confronted two automated metrics, BLEU and LTV, with new data for which human evaluation scores were also produced; we then measured the agreement between the automated and human evaluation scores. The resources produced in the experiment are available on the authors’ website.
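The protocol described in the abstract (score each system's output with an automatic metric, then measure agreement with human judgements) can be sketched as follows. This is only an illustration: it uses sacrebleu's BLEU and a Pearson correlation from SciPy rather than the paper's own LTV metric or agreement measure, and all system names, outputs and human scores are invented.

```python
# Minimal sketch of the evaluation protocol: score each MT system's output
# with an automatic metric (BLEU here), then measure agreement between the
# automatic scores and human judgements. LTV, the paper's own metric, is not
# reproduced; all data below are hypothetical.
import sacrebleu
from scipy.stats import pearsonr

references = ["the cat sat on the mat", "he read the report quickly"]
systems = {
    "system_a": ["the cat sat on the mat", "he read the report quickly"],
    "system_b": ["the cat is sitting on a mat", "he quickly read the report"],
    "system_c": ["cat sat mat on the", "report he read the quickly"],
}
human_usability = {"system_a": 4.6, "system_b": 3.9, "system_c": 1.8}  # invented

bleu_scores, human_scores = [], []
for name, outputs in systems.items():
    bleu = sacrebleu.corpus_bleu(outputs, [references]).score
    bleu_scores.append(bleu)
    human_scores.append(human_usability[name])
    print(f"{name}: BLEU = {bleu:.1f}, human = {human_usability[name]}")

# Agreement between the automatic metric and the human usability scores.
r, _ = pearsonr(bleu_scores, human_scores)
print(f"Pearson r (BLEU vs. human): {r:.3f}")
```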

Similar resources


Statistical modelling of MT output corpora for Information Extraction

The output of state-of-the-art machine translation (MT) systems could be useful for certain NLP tasks, such as Information Extraction (IE). However, some unresolved problems in MT technology could seriously limit the usability of such systems. For example, robust and accurate word sense disambiguation, which is essential for the performance of IE systems, is not yet achieved by commercial MT app...


Ranking vs. Regression in Machine Translation Evaluation

Automatic evaluation of machine translation (MT) systems is an important research topic for the advancement of MT technology. Most automatic evaluation methods proposed to date are score-based: they compute scores that represent translation quality, and MT systems are compared on the basis of these scores. We advocate an alternative perspective of automatic MT evaluation based on ranking. Inste...
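The ranking perspective sketched in this abstract can be made concrete with a rank-correlation check: instead of asking whether metric scores predict human scores numerically, one asks whether the metric orders the systems the same way humans do. The snippet below is a hedged illustration using Kendall's tau from SciPy; all system names and numbers are invented.

```python
# Hedged illustration of ranking-based comparison of MT systems: compare the
# ordering induced by an automatic metric with a human ranking using
# Kendall's tau. All names and numbers below are invented.
from scipy.stats import kendalltau

metric_scores = {"sys1": 31.2, "sys2": 28.4, "sys3": 25.9, "sys4": 33.0}
human_ranks = {"sys1": 2, "sys2": 3, "sys3": 4, "sys4": 1}  # 1 = best system

systems = sorted(metric_scores)
metric_values = [metric_scores[s] for s in systems]
# Negate the human ranks so that, like the metric, larger means better.
human_values = [-human_ranks[s] for s in systems]

tau, p_value = kendalltau(metric_values, human_values)
print(f"Kendall's tau between metric and human rankings: {tau:.2f}")
```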


Unobtrusive methods for low-cost manual evaluation of machine translation

Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are financially cheap to execute and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. In contrast, manual methods of MT evaluation are financially expensive. This paper will present early research being carried out within ...
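To give a rough idea of why n-gram co-occurrence metrics are computationally (and hence financially) cheap, the sketch below computes clipped n-gram precision of a hypothesis against a single reference. It is not the exact metric of any paper listed here, and it assumes simple whitespace tokenisation.

```python
# Toy illustration of an n-gram co-occurrence statistic (clipped n-gram
# precision), the kind of cheap computation such metrics rest on.
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp_tokens, ref_tokens = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    if not hyp_ngrams:
        return 0.0
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in hyp_ngrams.items())
    return overlap / sum(hyp_ngrams.values())

hyp = "the cat sat on mat"
ref = "the cat sat on the mat"
print(ngram_precision(hyp, ref, 1))  # unigram precision
print(ngram_precision(hyp, ref, 2))  # bigram precision
```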


Methods for human evaluation of machine translation

Evaluation of machine translation (MT) is a difficult task, both for humans, and using automatic metrics. The main difficulty lies in the fact that there is not one single correct translation, but many alternative good translation options. MT systems are often evaluated using automatic metrics, which commonly rely on comparing a translation to only a single human reference translation. An alter...
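The point about relying on a single reference can be illustrated with a metric that accepts several references. In the hypothetical example below, one MT output is scored with sacrebleu's BLEU first against a single reference and then against two alternatives; wording covered by the extra reference is no longer penalised. The sentences are invented.

```python
# Hedged illustration: BLEU of one hypothetical MT output scored against a
# single reference and then against two alternative references.
import sacrebleu

hypothesis = ["the minister gave a short statement to journalists"]
single_ref = [["the minister made a brief statement to the press"]]
multi_refs = [
    ["the minister made a brief statement to the press"],
    ["the minister gave a short statement to journalists"],
]

print(sacrebleu.corpus_bleu(hypothesis, single_ref).score)  # single reference
print(sacrebleu.corpus_bleu(hypothesis, multi_refs).score)  # two references
```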


Journal:

Volume   Issue

Pages  -

Publication date: 2004